SGI Developer Toolbox 6.1

Text File | 1994-08-02 | 8.2 KB | 173 lines

~4Dgifts/toolbox/src/exampleCode/speech README

    New, expanding subtree containing software for speech recognition.

    See also the speech Frequently Asked Questions file
        ~4Dgifts/toolbox/FAQs/netfaqs/speech-faq

    `!' indicates new or updated as of version 4.2

    The capabilities of speech recognition are discrete-utterance,
    speaker-independent, and small vocabulary.

    examples:   contains [so far] rudimentary speech example programs:

                * colors.c:  speech demo that opens a large X window and
                             changes colors when a color name is spoken,
                * recognize: C and C++ versions of the same program that

  ! inst:       contains beta-level inst images of both the execution and
                development environments for speech recognition;

    lackey:     a speech recognition application example.  lackey
                recognizes speech through the speech recognition library
                and uses speech to launch desktop applications;

    utilities:  [so far] contains three binaries--fmbg, gotoWindow, and
                xrset--useful to srpanel.


Read this if you are interested in trying speech recognition (skip to
near the end for a simple but uninformative example):

A version of the speech execution environment (speech_eoe) and the
speech development environment (speech_dev) is available in the inst
subdirectory.  Indigo (or later) audio capability and Irix 5 are
required.

This software can match discrete utterances from any speaker against a
small pretrained vocabulary.  No extra hardware is required (but a
better microphone usually helps).  The recognizer uses a constant 10%
of an R4K CPU.

Currently the only application that understands speech directly is
Showcase.  Other apps may be faked into responding to speech by having
the speech manager send keystrokes to the app in response to speech.
So far this has been done only for MediaMail/Zmail, Zip/Jot, and 4Dwm.
Several others are being experimented with, including CASE, the Icon
Catalog, other desktop entities, and Jot's electric C mode.  Users may
add their own actions and words to applications.
This software is somewhere between alpha and beta stages, needing at
least the following major improvements:

    * a real character for visual feedback
    * a complete set of trained vocabularies
    * integration with more apps (like the desktop)
    * UI improvements (operations on app word groups)
    * performance improvements
    * bug fixes
    * finished documentation
    * removal of debug output
    * placement in the toolchest or icon catalog
    * a way to deal with audio interference from the computer

After installing speech_eoe, you must reboot before srpanel (the speech
manager) can run.  If you do not, srpanel will generate the error
message "srpanel: could not connect to server".

Make sure your microphone is plugged in and placed somewhere away from
your noisy computer (do NOT hold the mic, as your breath and hands
cause a lot of noise).  Confirm that apanel's level meter rises when
you speak.  Verify that the mic is selected as input at 8 kHz and set
the gain around 7 (this varies between Indigos and Indys).

See the man pages speech, srpanel, speechbeta, and showspeech (although
they are in need of an update).  See the troubleshooting section of
srpanel's help.

After launching srpanel, verify that it is hearing you correctly by
speaking "go to sleep" and "wake up" and observing srpanel's change in
state (when sleeping, srpanel will only recognize "wake up").  When
srpanel has focus, all trained words are active but no actions are
taken.  With focus on srpanel, verify that srpanel recognizes "yes" and
"no".  If any of "go to sleep", "wake up", "yes", and "no" are not
correctly recognized, train them using srpanel's customization window
(select the word and click the train button).

Speech-aware Showcase is invoked with the command showspeech (installed
with speech_eoe.sw.misc).  Showcase must already be installed.  The
vocabulary for showspeech is modal, so see the vocabulary section of
showspeech's man page to understand what showspeech is expecting to
hear.
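The sleep/wake and focus behaviour of srpanel described above can be
pictured as a tiny state machine.  The sketch below is only a model for
illustration, not srpanel's code: the class name, method name, and
return strings are all made up here, and it assumes that "go to sleep"
and "wake up" are handled before the focus check, which the text above
does not actually state.

```python
class RecognizerModel:
    """Toy model of the srpanel behaviour described in this README.

    Hypothetical names throughout; this is not the speech library API.
    """

    def __init__(self):
        self.asleep = False

    def handle(self, phrase, srpanel_has_focus=False):
        """Return a label for what happens to a recognized phrase."""
        if self.asleep:
            # When sleeping, only "wake up" is recognized.
            if phrase == "wake up":
                self.asleep = False
                return "awake"
            return "ignored"
        if phrase == "go to sleep":
            self.asleep = True
            return "asleep"
        if srpanel_has_focus:
            # All trained words are active, but no actions are taken.
            return "recognized, no action"
        return "action"
```

For example, after handle("go to sleep") every phrase other than
"wake up" comes back as "ignored" until the model is woken again.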
Showspeech is not an approved version of Showcase, so don't report any
bugs against it to the Showcase group.

Other apps such as 4Dwm and MediaMail respond to speech indirectly: the
speech manager recognizes a word and then synthesizes keystrokes for
the app (speech-enabled, as opposed to speech-aware).  Because only
keystrokes are communicated to a speech-enabled application, actions in
response to speech are limited.  You may add your own word-actions in
srpanel's customization window, or use the "add from file" menu item to
bring in predefined word-actions for some applications.  Use MediaMail
instead of Zmail (unless you use "zmail" to invoke it) - same for
Jot/Zip.  See the bindings in the customization window for an
understanding of what can be spoken when (the current vocabulary is
determined by the class name of the window which has focus).

Srpanel may be instructed to respond to speech in various ways:

    * Some keys have multicharacter or symbolic names and are specified
      inside chevrons, such as <escape>.
    * Modifiers such as <alt> are released after a subsequent
      non-modifier.
    * Key presses and releases may also be controlled.
    * A delay event <delay> may be needed.
    * Srpanel may respond to speech with actions other than keystrokes,
      such as button presses <B#> and shell commands <!shell command>.

Using the shell command feature, there are ways to further manipulate
the desktop, such as switching desks, warping the pointer, and
launching applications.  See the binaries in the inst location.

Only some of the words have been pretrained (none of the words for
CASE), so more training *is* necessary.  Most of the words for 4Dwm's
predefined actions have been pretrained, along with a portion for
MediaMail/Zmail and Jot/Zip, and only a few for CASE, so further
training by the user is currently required to use even the predefined
action bindings.
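To make the chevron notation for word-actions described above more
concrete, here is a small sketch that splits an action string into
events.  It is not srpanel's parser; the exact grammar is documented in
srpanel's help and man page, not here.  The function name, the event
tuples, and the modifier names other than <alt> are assumptions, and
"#" in <B#> is assumed to stand for a button number.

```python
import re

# A chevron group such as <escape>, or else any single literal character.
TOKEN = re.compile(r"<([^<>]*)>|(.)", re.DOTALL)

# Only <alt> is named in this README; the other modifiers are assumed.
MODIFIERS = {"alt", "ctrl", "shift"}


def parse_action(text):
    """Split an action string into (kind, value) events (illustrative)."""
    events = []
    for match in TOKEN.finditer(text):
        name, char = match.group(1), match.group(2)
        if char is not None:
            events.append(("key", char))             # plain keystroke
        elif name.startswith("!"):
            events.append(("shell", name[1:]))       # <!shell command>
        elif name == "delay":
            events.append(("delay", None))           # <delay>
        elif re.fullmatch(r"B\d+", name):
            events.append(("button", int(name[1:]))) # <B1>, <B2>, ...
        elif name.lower() in MODIFIERS:
            events.append(("modifier", name.lower()))
        else:
            events.append(("key", name))             # named key, e.g. <escape>
    return events
```

With this sketch, parse_action("<alt>f<escape>") yields a modifier
event followed by two key events.  A shell command containing ">" would
confuse this simple pattern; how srpanel itself treats that case is not
covered here.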
Simple but uninformative example for some 4Dwm functionality:

    as root:
        # inst -f inst/speech_eoe
    verify everything is selected (default) and then do
        inst> go
        inst> exit
    then reboot, plug in your mic and set it on your monitor,
    login as yourself and run
        % srpanel
    launch apanel from srpanel's menu "Recognizer -> Audio Control Apanel"
    verify apanel's input rate at 8 kHz, source from mic, and gain at 7
    select srpanel's menu "Recognizer -> Customization"
    select Customization's menu "File -> Add From File"
    select "4Dwm" from the file browser
    place focus on any window (except any of srpanel's windows)
    say "raise window" or "lower window" and verify appropriate response
    train "yes", "no", "go to sleep", and "wake up"
    train other commands as necessary
    see above for more functionality

An API document (in Showcase format; there are no dev man pages yet) is
part of speech_dev.sw.misc and installs in
/usr/share/data/speech/misc/recog.api.

Speech synthesis is technically working on our machines, but we have no
plans or deals to ship it, so it is not included or used in the current
speech images.

Email questions, problems, comments, suggestions to lpw@sgi.com

-=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=-
Lance Welsh                 Lance Welsh M/S 01L-875
lpw@sgi.com                 Silicon Graphics, Inc.
wk: (415) 390-1860          PO Box 7311
hm: (415) 322-7225          Mountain View, CA 94039-7311
-=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=-